Boosting localized binary features for speech recognition

Authors

  • Anindya Roy
  • Mathew Magimai-Doss
  • Sébastien Marcel
Abstract

In a recent work, the framework of Boosted Binary Features (BBF) was proposed for ASR. In this framework, a small set of localized binary-valued features is selected using the Discrete AdaBoost algorithm. These features are then integrated into a standard HMM-based system using either single-layer perceptrons (SLP) or multilayer perceptrons (MLP). On the TIMIT phoneme recognition task, the features were found to perform significantly better than MFCC features when coupled with an SLP, and equally well when coupled with an MLP. The current work presents an overview of the idea and extends it in two directions: 1) fusion of BBF with MFCC and an analysis of their complementarity, and 2) scalability of the proposed features from phoneme recognition to the continuous speech recognition task, and their reusability on unseen data.
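
The abstract does not define exactly what each localized binary feature computes. The sketch below assumes, for illustration only, that each candidate feature thresholds a single cell of a flattened spectro-temporal patch (a decision stump), and shows how Discrete AdaBoost would greedily pick a small set of such features for one binary task (for example, one phoneme versus the rest). The function names (`stump`, `discrete_adaboost`, `binary_features`, `predict`), the patch layout and the synthetic data are hypothetical; the 0/1 outputs of the selected features are what would then be passed to an SLP or MLP.

```python
import math
import random

def stump(x, cell, thr, polarity):
    """Hypothetical localized binary feature: inspects one cell of a flattened patch."""
    return polarity if x[cell] > thr else -polarity

def discrete_adaboost(X, y, n_rounds):
    """Discrete AdaBoost with stump weak learners; labels y are +1/-1.
    Returns the selected features as (alpha, cell, thr, polarity) tuples."""
    n, n_cells = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform initial sample weights
    selected = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search over cells, observed thresholds and both polarities.
        for cell in range(n_cells):
            for thr in sorted({x[cell] for x in X}):
                for polarity in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump(xi, cell, thr, polarity) != yi)
                    if best is None or err < best[0]:
                        best = (err, cell, thr, polarity)
        err, cell, thr, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        if err >= 0.5:                      # no stump beats chance: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)
        selected.append((alpha, cell, thr, polarity))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, cell, thr, polarity))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return selected

def binary_features(x, selected):
    """0/1 vector of the selected localized features (input to an SLP/MLP)."""
    return [1 if stump(x, cell, thr, pol) > 0 else 0
            for _, cell, thr, pol in selected]

def predict(x, selected):
    """Boosted decision: sign of the alpha-weighted vote."""
    score = sum(a * stump(x, cell, thr, pol) for a, cell, thr, pol in selected)
    return 1 if score >= 0 else -1

# Synthetic demo: 4-cell "patches" whose label depends only on cell 2.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(40)]
y = [1 if x[2] > 0.5 else -1 for x in X]
selected = discrete_adaboost(X, y, n_rounds=3)
```

In this toy setup the first round should pick cell 2, since it alone separates the classes. The exhaustive search is deliberately naive; a practical system would sort values per cell to evaluate all thresholds in one pass.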

Similar resources

Boosting Localized Features for Speaker and Speech Recognition

In this thesis, we propose a novel approach for speaker and speech recognition involving localized, binary, data-driven features. The proposed approach is largely inspired by similar localized approaches in the computer vision domain. The success of these existing approaches coupled with their proven advantages of robustness and computational efficiency motivated us to apply these ideas to the ...

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for extracting features and for acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition

Whilst studies on emotion recognition show that gender-dependent analysis can improve emotion classification performance, the potential differences in the manifestation of depression between male and female speech have yet to be fully explored. This paper presents a qualitative analysis of phonetically aligned acoustic features to highlight differences in the manifestation of depression. Gender-...

Boosting Local Spectro-Temporal Features for Speech Analysis

We introduce the problem of phone classification in the context of speech recognition, and explore several sets of local spectro-temporal features that can be used for phone classification. In particular, we present some preliminary results for phone classification using two sets of features that are commonly used for object detection: Haar features and SVM-classified Histograms of Gradients (HoG).

Publication date: 2012